Chinese Terminology Extraction Using Window-Based Contextual Information

Authors

  • Luning Ji
  • Mantai Sum
  • Qin Lu
  • Wenjie Li
  • Yi-Rong Chen
Abstract

Terminology extraction is an important task for automatically updating domain-specific knowledge. Contextual information helps decide whether newly extracted candidate terms are genuine terminology. Because extraction based on fixed patterns is of very limited use for natural language text, both syntactic and semantic information in the context of a term is needed to determine its termhood. In this paper, we investigate two window-based context word extraction methods that take syntactic and semantic information into account. Based on the performance of each method individually, we propose a hybrid method that combines both syntactic and semantic information. Experiments show that the hybrid method achieves a significant improvement.
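
As a rough illustration of what "window-based context word extraction" can mean, the sketch below collects the words that fall within a fixed-size window to the left and right of each occurrence of a candidate term and counts them. This is only a minimal sketch under stated assumptions, not the method of the paper: the function names `window_context` and `count_context_words`, the `window` parameter, and the assumption that the text is already segmented into tokens are all hypothetical, and the abstract does not specify how syntactic or semantic information is applied to the collected words.

```python
from collections import Counter


def window_context(tokens, start, length, window=2):
    """Return (left, right) context words within `window` tokens of a term.

    `tokens` is one pre-segmented sentence; the candidate term occupies
    tokens[start:start + length]. All names here are illustrative assumptions.
    """
    left = tokens[max(0, start - window):start]
    right = tokens[start + length:start + length + window]
    return left, right


def count_context_words(sentences, term, window=2):
    """Count context words around every occurrence of `term` (a token list)."""
    counts = Counter()
    n = len(term)
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == term:
                left, right = window_context(tokens, i, n, window)
                counts.update(left)
                counts.update(right)
    return counts


# Toy usage with English tokens standing in for segmented Chinese text.
sentences = [
    ["we", "use", "neural", "network", "models", "today"],
    ["a", "neural", "network", "is", "trained"],
]
context_counts = count_context_words(sentences, ["neural", "network"], window=1)
```

In a fuller pipeline, counts like these would presumably be filtered or weighted by syntactic and semantic criteria before being used to score termhood; that step is left out because the abstract gives no details.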

Similar resources

NLP Techniques for Term Extraction and Ontology Population

This chapter investigates NLP techniques for ontology population, using a combination of rule-based approaches and machine learning. We describe a method for term recognition using linguistic and statistical techniques, making use of contextual information to bootstrap learning. We then investigate how term recognition techniques can be useful for the wider task of information extraction, makin...

A Comparative Study of the Effect of Word Segmentation On Chinese Terminology Extraction

Automatic term extraction is the first step towards automatic or semi-automatic updating of existing domain knowledge bases. Most research has applied word segmentation as a preprocessing step for Chinese term extraction. However, segmentation ambiguity is unavoidable, especially when identifying unknown words in Chinese. In this paper, we discuss the effect and limitations of segmentation to C...

Dynamic Signature Verification and Authentication Based on Extraction of Stable Dominant Points and Segmentation of Signature Patterns

A basic problem in signature verification is the variability apparent in signature patterns, even for a single individual. Segmenting a signature into basic components provides access to stable features and also reveals hidden differences between genuine and forged patterns. In this paper, two-dimensional signature patterns are segmented using dominant poi...

Two-Character Chinese Word Extraction Based on Hybrid of Internal and Contextual Measures

Word extraction is one of the important tasks in text information processing. There are two main kinds of statistic-based measures for word extraction: the internal measure and the contextual measure. This paper discusses these two kinds of measures for Chinese word extraction. First, nine widely adopted internal measures are tested and compared individually. Then various schemes of com...

Identifying Contextual Information for Multi-Word Term Extraction

Methods for multi-word term extraction have traditionally involved statistical techniques. More recently, hybrid techniques have been evolving which incorporate some linguistic knowledge. This information is generally very shallow, and researchers have tended to ignore any real understanding of either terms or the context in which they appear. We adopt an approach which uses a variety of knowle...

Publication date: 2007